Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems.

Authors

  • Lifu Huang
  • Jonathan May
  • Xiaoman Pan
  • Heng Ji
  • Xiang Ren
  • Jiawei Han
  • Lin Zhao
  • James A Hendler
Abstract

Automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework that combines symbolic and distributional semantics. We start by learning three types of representations for each entity mention: a general semantic representation, a specific context representation, and a knowledge representation based on knowledge bases. We then develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on two genres (news and discussion forum) show performance comparable with state-of-the-art supervised typing systems trained on a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
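
The clustering step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' joint clustering and linking algorithm, whose details are not given here; it is plain average-linkage agglomerative clustering over toy vectors, with the linking step omitted. The helper names (`cosine`, `combine`, `agglomerative`), the concatenation of the three representations, the similarity threshold, and the toy numbers are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combine(general, context, knowledge):
    """Concatenate the three per-mention representations.

    Concatenation is an assumption; the abstract does not say how
    the representations are combined.
    """
    return list(general) + list(context) + list(knowledge)


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering (hypothetical helper).

    Repeatedly merges the two most similar clusters until the best
    average pairwise similarity falls below `threshold`.
    Returns a sorted list of clusters, each a sorted list of indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        sims = [cosine(vectors[i], vectors[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        (x, y), best = max(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy mentions with made-up 2-dimensional representations of each kind.
mentions = ["Paris", "Berlin", "aspirin", "ibuprofen"]
vectors = [
    combine([1.0, 0.1], [0.9, 0.0], [1.0, 0.0]),
    combine([0.9, 0.2], [1.0, 0.1], [1.0, 0.0]),
    combine([0.1, 1.0], [0.0, 0.9], [0.0, 1.0]),
    combine([0.2, 0.9], [0.1, 1.0], [0.0, 1.0]),
]
clusters = agglomerative(vectors, threshold=0.8)
for cluster in clusters:
    print([mentions[i] for i in cluster])
```

With these toy vectors the two place names end up in one cluster and the two drug names in another; each resulting cluster would then still need a type label, which in the described framework comes from the linking component rather than from a predefined schema.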

Similar resources

SANE: System for Fine Grained Named Entity Typing on Textual Data

Assignment of fine-grained types to named entities is gaining popularity as one of the major Information Extraction tasks due to its applications in several areas of Natural Language Processing. Existing systems use huge knowledge bases to improve the accuracy of the fine-grained types. We designed and developed SANE, a system that uses Wikipedia categories to fine grain the type of the named e...

Fine-Grained Entity Typing with High-Multiplicity Assignments

As entity type systems become richer and more fine-grained, we expect the number of types assigned to a given entity to increase. However, most fine-grained typing work has focused on datasets that exhibit a low degree of type multiplicity. In this paper, we consider the high-multiplicity regime inherent in data sources such as Wikipedia that have semi-open type systems. We introduce a set-pred...

Corpus-level Fine-grained Entity Typing Using Contextual Information

This paper addresses the problem of corpus-level entity typing, i.e., inferring from a large corpus that an entity is a member of a class such as “food” or “artist”. The application of entity typing we are interested in is knowledge base completion, specifically, to learn which classes an entity is a member of. We propose FIGMENT to tackle this problem. FIGMENT is embedding-based and combines (...

Finer Grained Entity Typing with TypeNet

We consider the challenging problem of entity typing over an extremely fine grained set of types, wherein a single mention or entity can have many simultaneous and often hierarchically-structured types. Despite the importance of the problem, there is a relative lack of resources in the form of fine-grained, deep type hierarchies aligned to existing knowledge bases. In response, we introduce Typ...

Entity Analysis with Weak Supervision: Typing, Linking, and Attribute Extraction

Xiao Ling. Chair of the Supervisory Committee: Professor Daniel S. Weld, Computer Science and Engineering. With the advent of the Web, textual information has grown at an explosive rate. To digest this enormous amount of data, an automatic solution, Information Extraction (IE), has become necessary. Information extrac...

Journal:
  • Big data

Volume 5, Issue 1

Pages: -

Publication date: 2017